The DART classification of unannotated transcription within the ENCODE regions: associating transcription with known and novel loci.
Abstract
For the approximately 1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of "unannotated transcription." We use a number of disparate features to classify the 6988 novel TARs: array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART.gersteinlab.org). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that approximately 14% of the novel TARs can be associated with known genes, while approximately 21% can be clustered into approximately 200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs and many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested validate by RT-PCR and four of five sequenced PCR products confirm connectivity unambiguously.
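The step of associating TARs with proximal exons that have correlated expression profiles can be illustrated with a minimal sketch. The abstract does not specify the correlation measure, the distance window, or the cutoff, so the use of Pearson correlation, the `max_dist` and `min_corr` values, and all function and variable names below are assumptions made purely for illustration, not the authors' method.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sx * sy)

def associate_tars(tars, exons, max_dist=10_000, min_corr=0.9):
    """Map each TAR id to the gene of its best-correlated nearby exon.

    tars:  list of (tar_id, position, profile)
    exons: list of (gene, position, profile)
    max_dist and min_corr are hypothetical thresholds, not values from the paper.
    A TAR with no qualifying exon maps to None.
    """
    result = {}
    for tar_id, tar_pos, tar_prof in tars:
        best_gene, best_r = None, min_corr
        for gene, exon_pos, exon_prof in exons:
            if abs(exon_pos - tar_pos) > max_dist:
                continue  # exon is not proximal to this TAR
            r = pearson(tar_prof, exon_prof)
            if r >= best_r:
                best_gene, best_r = gene, r
        result[tar_id] = best_gene
    return result
```

In this sketch, TARs left mapped to `None` would be the candidates passed on to the later clustering step into putative novel loci.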
Journal: Genome Research
Volume 17, Issue 6
Publication date: 2007